Search CORE

60 research outputs found

Language and Dialect Identification of Cuneiform Texts

Authors: Alstola Tero, Jauhiainen Heidi, Jauhiainen Tommi, Lindén Krister
Publication date: 01/01/2019

This article introduces a corpus of cuneiform texts from which the dataset for the use of the Cuneiform Language Identification (CLI) 2019 shared task was derived as well as some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first time automatic language identification methods have been used on cuneiform data.

arXiv.org e-Print Archive

Crossref

From Sherds of Pottery to Open Egyptological Data

Author: Jauhiainen Heidi
Publication date: 02/12/2022

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Encoding Hieroglyphic Texts

Author: Jauhiainen Heidi
Publication venue: CEUR-WS.org
Publication date: 01/01/2022

With the help of data science, researchers in the humanities can study large amounts of data at once and find regularities that they might not otherwise detect. In order to use digital methods, the texts to be examined must be in machine-readable form, but the lack of such text corpora hinders the digital study of ancient Egyptian texts. A sign can be next to, above, or over another in a hieroglyphic text, and two or more signs can be nested. Egyptologists use encoding to maintain the information on the signs and their places relative to each other when preparing hieroglyphic texts for publication in printed form. The encoding uses letter-number combinations from the Gardiner list, a standard reference list for Ancient Egyptian hieroglyphs. To increase the number of machine-readable hieroglyphic texts, the plan is to develop a workflow that uses automatic transliteration. This paper aims to present the first steps towards this goal. Ancient Egyptian texts are encoded by hand in JSesh, an open-source hieroglyphic editor. The aim is to publish annotated texts in a structured form, and a tool is being built to turn the binary format files produced in JSesh into machine-readable form. This paper introduces Gly2Mdc version 1.0, which extracts and cleans the encoding from the binary file. The tool is openly available and can be used for files with the extension .gly and containing encoded hieroglyphic text. Peer reviewed

Helsingin yliopiston digitaalinen arkisto